NVIDIA Unveils Advanced Riva TTS Models for Multilingual Speech Synthesis and Voice Cloning
NVIDIA has launched three new Riva TTS models: Magpie TTS Multilingual, Magpie TTS Zeroshot, and Magpie TTS Flow. All three use a streaming encoder-decoder transformer architecture and are built to deliver high-quality, natural-sounding speech, with the Multilingual model covering several languages.
The Magpie TTS Multilingual model supports English, Spanish, French, and German, targeting applications such as multilingual IVR systems and digital human interactions. Magpie TTS Zeroshot and Magpie TTS Flow, by contrast, focus on English, with use cases spanning live telephony, gaming NPCs, and podcast narration.
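For readers weighing which model fits a deployment, the language coverage described above can be expressed as a simple lookup. This is only an illustrative sketch: the table name SUPPORTED_LANGUAGES and the function models_for_language are hypothetical, the keys are descriptive labels rather than official Riva model identifiers, and nothing here calls the Riva API.

```python
# Hypothetical lookup table built only from the language support stated in
# the article. The keys are illustrative labels, not official Riva model IDs.
SUPPORTED_LANGUAGES = {
    "Magpie TTS Multilingual": {"English", "Spanish", "French", "German"},
    "Magpie TTS Zeroshot": {"English"},
    "Magpie TTS Flow": {"English"},
}


def models_for_language(language: str) -> list[str]:
    """Return the models whose stated coverage includes `language`."""
    wanted = language.strip().casefold()
    return [
        name
        for name, languages in SUPPORTED_LANGUAGES.items()
        if wanted in {lang.casefold() for lang in languages}
    ]


if __name__ == "__main__":
    # Print the candidate models for a few example languages.
    for language in ("English", "German", "Japanese"):
        print(language, "->", models_for_language(language))
```

Under these assumptions, a German-language IVR deployment would be limited to the Multilingual model, while an English-only project could consider any of the three.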
The architecture pairs a non-autoregressive encoder with an autoregressive decoder and incorporates preference alignment, a technique for steering generated output toward what listeners prefer. The technology could reshape industries that rely on voice interfaces, from customer service to entertainment.